{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "978146e2",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/llm/openvino.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f717d3d4-942b-4d86-9435-fc44b3ac6d39",
   "metadata": {},
   "source": [
    "# OpenVINO GenAI LLMs\n",
    "\n",
    "[OpenVINO™](https://github.com/openvinotoolkit/openvino) is an open-source toolkit for optimizing and deploying AI inference. OpenVINO™ Runtime can enable running the same model optimized across various hardware [devices](https://github.com/openvinotoolkit/openvino?tab=readme-ov-file#supported-hardware-matrix). Accelerate your deep learning performance across use cases like: language + LLMs, computer vision, automatic speech recognition, and more.\n",
    "\n",
    "`OpenVINOGenAILLM` is a wrapper of [OpenVINO-GenAI API](https://github.com/openvinotoolkit/openvino.genai). OpenVINO models can be run locally through this entitiy wrapped by LlamaIndex :"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90cf0f2e-8d8d-4e42-81bf-866c759221e1",
   "metadata": {},
   "source": [
    "In the below line, we install the packages necessary for this demo:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f413f179",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-llms-openvino-genai"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "29e5904b-ec39-4add-9292-f56e37047324",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install optimum[openvino]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3dac8f9f-7136-43f7-9e9f-de679e74d66e",
   "metadata": {},
   "source": [
    "Now that we're set up, let's play around:"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2c577674",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "86028752",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0465029c-fe69-454a-9561-55f7a382b2e2",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home2/ethan/intel/llama_index/llama_test/lib/python3.10/site-packages/pydantic/_internal/_fields.py:132: UserWarning: Field \"model_path\" in OpenVINOGenAILLM has conflict with protected namespace \"model_\".\n",
      "\n",
      "You may be able to resolve this warning by setting `model_config['protected_namespaces'] = ()`.\n",
      "  warnings.warn(\n"
     ]
    }
   ],
   "source": [
    "from llama_index.llms.openvino_genai import OpenVINOGenAILLM"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3e21cef-b3c3-4ddd-a70c-728de440648e",
   "metadata": {},
   "source": [
    "### Model Exporting\n",
    "\n",
    "It is possible to [export your model](https://github.com/huggingface/optimum-intel?tab=readme-ov-file#export) to the OpenVINO IR format with the CLI, and load the model from local folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a27feba3-d027-4d10-b1af-1e130e764a67",
   "metadata": {},
   "outputs": [],
   "source": [
    "!optimum-cli export openvino --model microsoft/Phi-3-mini-4k-instruct --task text-generation-with-past --weight-format int4 model_path"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ad6b5a5-2e87-4d28-9542-55cbd7b61932",
   "metadata": {},
   "source": [
    "You can download a optimized IR model from OpenVINO model hub of Hugging Face."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3ff1f97-ad27-4e9a-ac7b-7eff004748e0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "764beb4397df45658ee5d6e5a0dd3902",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Fetching 17 files:   0%|          | 0/17 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "'/home2/ethan/intel/llama_index/docs/examples/llm/Phi-3-mini-4k-instruct-int4-ov'"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import huggingface_hub as hf_hub\n",
    "\n",
    "model_id = \"OpenVINO/Phi-3-mini-4k-instruct-int4-ov\"\n",
    "model_path = \"Phi-3-mini-4k-instruct-int4-ov\"\n",
    "\n",
    "hf_hub.snapshot_download(model_id, local_dir=model_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "786bdeba-71a5-4454-ae80-82e4db24a58f",
   "metadata": {},
   "source": [
    "### Model Loading\n",
    "\n",
    "Models can be loaded by specifying the model parameters using the `OpenVINOGenAILLM` method.\n",
    "\n",
    "If you have an Intel GPU, you can specify `device=\"gpu\"` to run inference on it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d064eff0-f1dd-49cd-857b-d963a3d1813e",
   "metadata": {},
   "outputs": [],
   "source": [
    "ov_llm = OpenVINOGenAILLM(\n",
    "    model_path=model_path,\n",
    "    device=\"CPU\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75569a58-a32c-4e48-9066-3c5278ced488",
   "metadata": {},
   "source": [
    "You can pass the generation config parameters through `ov_llm.config`. The supported parameters are listed at the [openvino_genai.GenerationConfig](https://docs.openvino.ai/2024/api/genai_api/_autosummary/openvino_genai.GenerationConfig.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a718899a-876b-4dd1-ab6f-25feac4d1f71",
   "metadata": {},
   "outputs": [],
   "source": [
    "ov_llm.config.max_new_tokens = 100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e25c7162",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "# Answer\n",
      "The meaning of life is a profound and complex question that has been debated by philosophers, theologians, scientists, and thinkers throughout history. Different cultures, religions, and individuals have their own interpretations and beliefs about what gives life purpose and significance.\n",
      "\n",
      "From a philosophical standpoint, existentialists like Jean-Paul Sartre and Albert Camus have argued that life inherently has no meaning, and it is\n"
     ]
    }
   ],
   "source": [
    "response = ov_llm.complete(\"What is the meaning of life?\")\n",
    "print(str(response))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dda1be10",
   "metadata": {},
   "source": [
    "### Streaming\n",
    "\n",
    "Using `stream_complete` endpoint "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12e0f3c0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "\n",
      "Paul Graham is a computer scientist and entrepreneur who is best known for founding the startup accelerator program Y Combinator. He is also the founder of the web development company Viaweb, which was acquired by PayPal for $497 million in 1raneworks.\n",
      "\n",
      "What is Y Combinator?\n",
      "\n",
      "Y Combinator is a startup accelerator program that provides funding, mentorship, and resources to early-stage start"
     ]
    }
   ],
   "source": [
    "response = ov_llm.stream_complete(\"Who is Paul Graham?\")\n",
    "for r in response:\n",
    "    print(r.delta, end=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2c87c383",
   "metadata": {},
   "source": [
    "Using `stream_chat` endpoint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2db801a8",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "I'm Phi, Microsoft's AI assistant. How can I assist you today?"
     ]
    }
   ],
   "source": [
    "from llama_index.core.llms import ChatMessage\n",
    "\n",
    "messages = [\n",
    "    ChatMessage(\n",
    "        role=\"system\", content=\"You are a pirate with a colorful personality\"\n",
    "    ),\n",
    "    ChatMessage(role=\"user\", content=\"What is your name\"),\n",
    "]\n",
    "resp = ov_llm.stream_chat(messages)\n",
    "\n",
    "for r in resp:\n",
    "    print(r.delta, end=\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3fa723d6-4308-4d94-9609-8c51ce8184c3",
   "metadata": {},
   "source": [
    "For more information refer to:\n",
    "\n",
    "* [OpenVINO LLM guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide.html).\n",
    "\n",
    "* [OpenVINO Documentation](https://docs.openvino.ai/2024/home.html).\n",
    "\n",
    "* [OpenVINO Get Started Guide](https://www.intel.com/content/www/us/en/content-details/819067/openvino-get-started-guide.html).\n",
    "\n",
    "* [RAG example with LlamaIndex](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-rag-llamaindex)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
